Learning Chinese Word Representations From Glyphs Of Characters
Authors
Abstract
In this paper, we propose new methods for learning Chinese word representations. Chinese characters are composed of graphical components that carry rich semantics, and a learner of Chinese commonly infers the meaning of a word from these components. We therefore propose models that enhance word representations with character glyphs. The glyph features are learned directly from character bitmaps by a convolutional auto-encoder (convAE), and they further improve Chinese word representations that are already enhanced by character embeddings. As a further contribution, we created several evaluation datasets in traditional Chinese and made them public.
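As a rough illustration of how glyph features might enrich a word representation that already uses character embeddings, the sketch below concatenates a word vector with the average of each character's embedding joined to its glyph feature. The abstract does not specify the composition scheme, so the concatenate-and-average rule, the function names `mean_vector` and `compose_word_vector`, and all numeric values are assumptions made for illustration. The glyph features are toy stand-ins for what a convAE encoder would produce from character bitmaps.

```python
from typing import Dict, List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def compose_word_vector(
    word: str,
    word_emb: Dict[str, Vector],
    char_emb: Dict[str, Vector],
    glyph_feat: Dict[str, Vector],
) -> Vector:
    """Hypothetical composition rule (an assumption, not the paper's model):
    word vector ++ mean over characters of (char embedding ++ glyph feature).
    """
    # Join each character's embedding with its glyph feature.
    per_char = [char_emb[ch] + glyph_feat[ch] for ch in word]
    # Average over the characters of the word, then append to the word vector.
    return word_emb[word] + mean_vector(per_char)


# Toy values only; real glyph features would come from a trained encoder.
word_emb = {"森林": [0.1, 0.2]}
char_emb = {"森": [1.0, 0.0], "林": [0.0, 1.0]}
glyph_feat = {"森": [0.5], "林": [0.25]}

vector = compose_word_vector("森林", word_emb, char_emb, glyph_feat)
print(len(vector), vector)
```

Concatenation is used here only because it avoids any constraint on the relative dimensions of the word, character, and glyph vectors; an additive scheme would need them to match.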
Similar resources
Hantology-A Linguistic Resource for Chinese Language Processing and Studying
Hantology, a character-based Chinese language resource is created to provide an infrastructure for language processing and research on the writing system. Unlike alphabetic or syllabic writing systems, the ideographic writing system of Chinese poses both a challenge and an opportunity. The challenge is that a totally different resources structure must be created to represent and process speaker...
Exploiting Word Internal Structures for Generic Chinese Sentence Representation
We introduce a novel mixed character-word architecture to improve Chinese sentence representations, by utilizing rich semantic information of word internal structures. Our architecture uses two key strategies. The first is a mask gate on characters, learning the relation among characters in a word. The second is a max-pooling operation on words, adaptively finding the optimal mixture of the atomi...
Improved Learning of Chinese Word Embeddings with Semantic Knowledge
While previous studies show that modeling the minimum meaning-bearing units (characters or morphemes) benefits learning vector representations of words, they ignore the semantic dependencies across these units when deriving word vectors. In this work, we propose to improve the learning of Chinese word embeddings by exploiting semantic knowledge. The basic idea is to take the semantic knowledge a...
Item-specific and generalization effects on brain activation when learning Chinese characters.
Neural changes related to learning of the meaning of Chinese characters in English speakers were examined using functional magnetic resonance imaging (fMRI). We examined item specific learning effects for trained characters, but also the generalization of semantic knowledge to novel transfer characters that shared a semantic radical (part of a character that gives a clue to word meaning, e.g. w...
Deep Learning for Chinese Word Segmentation and POS Tagging
This study explores the feasibility of performing Chinese word segmentation (CWS) and POS tagging by deep learning. We try to avoid task-specific feature engineering, and use deep layers of neural networks to discover relevant features to the tasks. We leverage large-scale unlabeled data to improve internal representation of Chinese characters, and use these improved representations to enhance ...